Data Manipulation with dplyr

Masumbuko Semba & Nyamisi Peter

2024-05-03

Learning Agenda

  1. Get familiar with R and RStudio
  2. Data structure and data types
  3. Reading and writing data in RStudio
  4. Tidying with tidyverse
  5. Plotting and Visualization
  6. Descriptive Statistics
  7. Data manipulation with tidyverse
  8. Inferential Statistics
  9. Modelling and simulation
  10. Spatial Handling and Analysis

Data Manipulation with dplyr

Loading a package

  • We rely on tidyverse for data manipulation
  • tidyverse is an ecosystem of packages
  • It includes dplyr, a package dedicated to data manipulation
  • tidyverse is not loaded by default; we need to load it in each session
  • Loading it attaches dplyr and the other core packages

Loading tidyverse

library(tidyverse)
Warning: package 'ggplot2' was built under R version 4.3.3
Warning: package 'tidyr' was built under R version 4.3.3
Warning: package 'readr' was built under R version 4.3.3
Warning: package 'purrr' was built under R version 4.3.1
Warning: package 'dplyr' was built under R version 4.3.2
Warning: package 'stringr' was built under R version 4.3.2
Warning: package 'lubridate' was built under R version 4.3.1
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
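
The Conflicts message above means that, once tidyverse is attached, a bare call to filter() or lag() resolves to the dplyr version rather than the stats one. A minimal sketch of how to be explicit about which function is meant, using the package::function form; the small tibble and its column name are made up for illustration and are not part of the workshop data:

```r
library(dplyr)

# Made-up data for illustration only
sizes <- tibble::tibble(size_cm = c(16.5, 12.0, 15.0, 33.0))

# dplyr::filter() keeps the rows that satisfy a condition
large <- dplyr::filter(sizes, size_cm > 15)

# stats::filter() is a different function: it applies a linear filter
# (here a two-point moving average) to a numeric series
smoothed <- stats::filter(sizes$size_cm, rep(1 / 2, 2))
```

Writing the package prefix is optional for dplyr once tidyverse is loaded, but it is the only way to reach the masked stats functions.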

Import the data

  • Import the LFQ sample dataset into our session
lfq <- readxl::read_excel("optimal/posts/data/LFQ_sample_1.xls")
  • Have a glimpse of the loaded dataset
lfq |> 
  glimpse()
Rows: 779
Columns: 6
$ Date         <dttm> 2019-05-30, 2019-05-30, 2019-05-30, 2019-05-30, 2019-05-…
$ Species      <chr> "Siganus sutor", "Siganus sutor", "Siganus sutor", "Sigan…
$ `Size (cm)`  <dbl> 16.5, 12.0, 15.0, 33.0, 11.2, 17.0, 19.0, 10.0, 11.0, 9.0…
$ `Size Class` <dbl> 18, 12, 15, 33, 12, 18, 21, 12, 12, 12, 12, 12, 12, 12, 1…
$ `Gear type`  <chr> "Speargun", "Speargun", "Speargun", "Speargun", "hook and…
$ Landing_site <chr> "Pwani", "Pwani", "Pwani", "Pwani", "Pwani", "Pwani", "Pw…
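
Column names that contain spaces or parentheses, such as `Size (cm)`, must be wrapped in backticks every time they are used. One option, sketched here on a made-up two-row tibble that mimics the columns shown by glimpse(), is to rename them once with rename(). The new names (size_cm, size_class, gear_type) are hypothetical choices, not names used elsewhere in these slides:

```r
library(dplyr)

# Made-up rows that mimic the structure shown by glimpse()
lfq_demo <- tibble::tibble(
  Species      = c("Siganus sutor", "Siganus sutor"),
  `Size (cm)`  = c(16.5, 12.0),
  `Size Class` = c(18, 12),
  `Gear type`  = c("Speargun", "Speargun")
)

# rename(new = old): non-syntactic old names need backticks
lfq_clean <- lfq_demo |>
  rename(
    size_cm    = `Size (cm)`,
    size_class = `Size Class`,
    gear_type  = `Gear type`
  )

names(lfq_clean)
```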

Basic Verbs

  • dplyr has a set of functions for data manipulation
  • These functions are called verbs
  • Like verbs in English, each one names an action to perform on the data
  • For this reason dplyr is also called a grammar of data manipulation
  • filter
  • slice
  • select
  • mutate
  • arrange
  • summarise
  • group_by
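
The verbs listed above can be sketched on a small example. The tibble below is made up for illustration (its column names and values are assumptions, not the workshop dataset), and each step shows one verb in isolation:

```r
library(dplyr)

# Made-up data for illustration; not the workshop dataset
fish <- tibble::tibble(
  species = c("A", "A", "B", "B", "B"),
  gear    = c("Speargun", "Speargun", "Speargun", "Net", "Net"),
  size_cm = c(16.5, 12.0, 15.0, 33.0, 11.2)
)

# filter(): keep rows that meet a condition
big <- fish |> filter(size_cm >= 15)

# slice(): keep rows by position
first_two <- fish |> slice(1:2)

# select(): keep columns by name
sizes_only <- fish |> select(species, size_cm)

# mutate(): add or modify columns
with_mm <- fish |> mutate(size_mm = size_cm * 10)

# arrange(): sort rows, here from largest to smallest
sorted <- fish |> arrange(desc(size_cm))

# group_by() + summarise(): one summary row per group
by_species <- fish |>
  group_by(species) |>
  summarise(n = n(), mean_size = mean(size_cm))
```

Because every verb takes a data frame and returns a data frame, the steps can also be chained with the pipe into a single pipeline.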

Thank You for Attending
